Skip to content

Enabling index scans for OID cast patterns in system views#4868

Merged
tanscorpio7 merged 6 commits into
babelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:jira-babel-ssms-perf-5x
Jun 22, 2026
Merged

Enabling index scans for OID cast patterns in system views#4868
tanscorpio7 merged 6 commits into
babelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:jira-babel-ssms-perf-5x

Conversation

@RuchaSK1

@RuchaSK1 RuchaSK1 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Description

Implementing match_opclause_to_indexcol_hook to rewrite expressions like (oid)::integer = value to oid = value::oid, enabling index usage in system catalog views.

Following improvement is observed in SSMS Queries (36k objects):

SSMS Operation Before After Improvement
Expanding table Keys 20 min 6 min 4s 3.3×
Expanding table Indexes 14.5 min 34.6s 25×
View → Indexes 14 min 34.1s 24×
Expanding Tables 1 min 17s 29.7s 2.6×

Cherry-picked from: #4829
Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#774

Issues Resolved

Task: BABEL-6814

Authored-by: Rucha Kulkarni ruchask@amazon.com
Signed-off-by: Rucha Kulkarni ruchask@amazon.com

Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#762

Test Scenarios Covered

Since this is a planner only improvements, we have not added any tests. There are existing tests which check correctness.

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@RuchaSK1

Copy link
Copy Markdown
Contributor Author

/code-review

kuntalghosh
kuntalghosh previously approved these changes Jun 19, 2026

@kuntalghosh kuntalghosh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Verdict: COMMENT

Based on the provided code changes, here are my feedback and suggestions:

Issue 1 - In contrib/babelfishpg_tsql/src/hooks.c at line 59: #include ordering:

#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "access/stratnum.h"
#include "parser/analyze.h"

The #include "access/stratnum.h" is placed after the optimizer/ includes, breaking alphabetical ordering by directory prefix. The project convention groups includes alphabetically by path.

Suggested fix:

#include "access/stratnum.h"
...
#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"

Issue 2 - In contrib/babelfishpg_tsql/src/pltsql.h at lines 2523–2541: Unrelated InsertExecContext declarations bundled with an index-scan hook PR:

typedef struct InsertExecContext
{
	Oid			temp_table_oid;
	Oid			target_rel_oid;
	bool		is_target_relation_modified;
} InsertExecContext;

extern InsertExecContext insert_exec_ctx;
extern Oid create_insert_exec_temp_table(...);
extern void flush_insert_exec_temp_table(...);
extern DestReceiver *CreateInsertExecDestReceiver(void);

These declarations expose an INSERT...EXEC API surface that has no corresponding implementation in this PR. No .c file defines these symbols, so they are dead code at this point. Including them here creates a confusing scope for review — the PR title and description focus entirely on the OID cast index-scan optimization (BABEL-6814), yet these declarations belong to a different feature (INSERT...EXEC support).

This is not a blocking issue since the declarations are inert (no linker impact when unreferenced), but it would be cleaner to:

  1. Land them in the PR that introduces the implementation, or
  2. Guard them behind a comment noting the follow-up PR that will use them.

The security review noted that the implementation PR must be carefully reviewed for state-reset on error paths, TOCTOU on target_rel_oid, and authorization checks in CreateInsertExecDestReceiver.

Issue 3 - In test/JDBC/input/BABEL-6814-vu-prepare.sql: Test setup creates 100 tables and 500 schemas for plan stability:

WHILE @i <= 100
BEGIN
    SET @sql = 'CREATE TABLE babel_6814_extra_' + CONVERT(VARCHAR, @i) + ' (id INT)';
    ...
END
...
WHILE @j <= 500
BEGIN
    SET @sql2 = 'CREATE SCHEMA babel_6814_schema_' + CONVERT(VARCHAR, @j);
    ...
END

Creating 100 extra tables + 100 indexes + 500 schemas to force planner cost estimates toward index scans is a heavy-weight test fixture. This significantly increases test runtime (especially in upgrade tests where it runs against every version). Consider whether a smaller cardinality (e.g., 20 tables, 50 schemas) would be sufficient to tip the planner, or whether SET enable_seqscan = off in the psql EXPLAIN section would produce deterministic plans without the bulk object creation.

Issue 4 - Advisory: EXPLAIN plan tests may be fragile across PostgreSQL major versions:

The EXPLAIN (COSTS OFF) tests (Tests 16–22) assert specific plan shapes (e.g., "Index Scan using pg_class_oid_index"). While the hook guarantees the index-scannable rewrite is available, the planner's final plan choice depends on table statistics, catalog size, and cost parameters. The -- parallel_query_expected variant already shows different plan wrappers (Gather nodes). If PostgreSQL's cost model changes across major versions, these tests could produce spurious failures. The existing mitigation (bulk object creation) helps but is not guaranteed for all future PG versions.

Consider adding a comment in the test file noting that these EXPLAIN tests may need updating when PostgreSQL's cost model or parallel query thresholds change.

Missing Tests

No critical gaps. The test coverage is thorough:

  • Both operand orderings (left/right cast patterns)
  • Correlated subqueries, joins, IN-list subqueries
  • Multiple catalog tables (pg_class, pg_index, pg_attribute, pg_namespace)
  • Non-existent OID (zero-row case)
  • Parallel query variant
  • EXPLAIN plan verification

One minor gap: no test for the contain_volatile_functions guard — i.e., a query like WHERE (oid)::int4 = nextval('seq') that should NOT be rewritten to use an index scan. This is a correctness edge case rather than a functional gap since a volatile expression would still produce correct results (just without the optimization).

Notes

  • The PR description references BABEL-6435 (OBJECTPROPERTYEX C rewrite) but this PR only implements BABEL-6814 (OID cast index scans). The description content about OBJECTPROPERTYEX appears to be carried over from the cherry-pick source PR #4829 and is somewhat misleading for this standalone change. The actual code change here is focused and well-scoped.
  • Security review (0 actionable findings): The planner hook is semantically correct — int4 and oid share 32-bit representation, RelabelType is binary-identity, oideq is leakproof, and security_level is properly propagated. The hook correctly chains with prev_match_opclause_to_indexcol_hook and restores it on uninstall.
  • The InsertExecContext declarations in pltsql.h should receive a dedicated security review when their implementation lands.
  • Upgrade schedule files correctly add BABEL-6814 to all version schedules, ensuring the test runs during minor version upgrades.

Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.

Rucha Kulkarni added 2 commits June 20, 2026 14:27
@RuchaSK1 RuchaSK1 force-pushed the jira-babel-ssms-perf-5x branch from 9a738bc to 10be049 Compare June 20, 2026 14:30
Rucha Kulkarni added 4 commits June 21, 2026 05:10
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
@RuchaSK1

Copy link
Copy Markdown
Contributor Author

/code-review

2 similar comments
@RuchaSK1

Copy link
Copy Markdown
Contributor Author

/code-review

@RuchaSK1

Copy link
Copy Markdown
Contributor Author

/code-review

tanscorpio7 pushed a commit to babelfish-for-postgresql/postgresql_modified_for_babelfish that referenced this pull request Jun 22, 2026
Adding a hook in match_opclause_to_indexcol() that allows babelfish to
provide index clauses when the built-in logic cannot match an OpExpr to
an index column. This enables extensions to handle type-cast patterns
that prevent index usage due to operator family mismatches.

Extension PR: babelfish-for-postgresql/babelfish_extensions#4868

Task: BABEL-6814

Authored-by: Rucha Kulkarni ruchask@amazon.com
@tanscorpio7 tanscorpio7 requested a review from kuntalghosh June 22, 2026 05:03

@kuntalghosh kuntalghosh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Verdict: COMMENT

Based on the provided code changes, here are my feedback and suggestions:

Issue 1 - In contrib/babelfishpg_tsql/src/hooks.c at line 59: #include ordering:

#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "access/stratnum.h"
#include "parser/analyze.h"

The #include "access/stratnum.h" is placed after the optimizer/ group, breaking the directory-prefix grouping convention used throughout this file. While the file already has some ordering inconsistencies, new additions should follow the convention.

Suggested fix:

#include "access/stratnum.h"
...
#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "optimizer/paths.h"
#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"

Issue 2 - In contrib/babelfishpg_tsql/src/pltsql.h at lines 2523–2541: Unrelated InsertExecContext declarations in an index-scan optimization PR:

typedef struct InsertExecContext
{
    Oid         temp_table_oid;
    Oid         target_rel_oid;
    bool        is_target_relation_modified;
} InsertExecContext;

extern InsertExecContext insert_exec_ctx;
extern Oid create_insert_exec_temp_table(...);
extern void flush_insert_exec_temp_table(...);
extern DestReceiver *CreateInsertExecDestReceiver(void);

These declarations expose an INSERT...EXEC API surface that has no corresponding implementation in this PR — no .c file defines these symbols. Including dead forward declarations in a PR focused on OID cast index-scan optimization (BABEL-6814) confuses the review scope. They are inert (no linker impact when unreferenced), but would be cleaner landing in the PR that introduces their implementation, or guarded by a comment noting the follow-up PR.

The security review confirmed these are harmless in isolation, but flagged that the future implementation PR must be carefully reviewed for: SQL injection in create_insert_exec_temp_table (all const char * params imply dynamic DDL), stale OID exposure from the global insert_exec_ctx on transaction abort, and DestReceiver lifecycle on error paths.

Issue 3 - In test/JDBC/input/BABEL-6814-vu-prepare.sql: Heavy test fixture (100 tables + 500 schemas) for plan stability:

WHILE @i <= 100
BEGIN
    SET @sql = 'CREATE TABLE babel_6814_extra_' + CONVERT(VARCHAR, @i) + ' (id INT)';
    ...
END
...
WHILE @j <= 500
BEGIN
    SET @sql2 = 'CREATE SCHEMA babel_6814_schema_' + CONVERT(VARCHAR, @j);
    ...
END

Creating 100 extra tables, 100 indexes, and 500 schemas to force planner cost estimates toward index scans is a heavyweight fixture. This adds significant runtime to every upgrade test version (60+ schedules now include BABEL-6814). Consider whether a smaller cardinality (e.g., 20 tables, 50 schemas) would be sufficient to tip the planner, or whether SET enable_seqscan = off in the psql EXPLAIN section would produce deterministic plans without the bulk object creation.

Issue 4 - Advisory: EXPLAIN plan tests may be fragile across PostgreSQL major versions:

The EXPLAIN (COSTS OFF) tests (Tests 16–22) assert specific plan shapes like "Index Scan using pg_class_oid_index". While the hook guarantees the rewrite is available, the planner's final choice depends on table statistics and cost parameters. The -- parallel_query_expected variant already shows different plan wrappers (Gather nodes). If PostgreSQL's cost model changes across major versions, these tests could produce spurious failures.

Consider adding a brief comment in the test file noting these EXPLAIN tests may need updating when PostgreSQL's cost model or parallel query thresholds change.

Missing Tests

No critical gaps. The test coverage is thorough:

  • Both operand orderings (left/right cast patterns)
  • Correlated subqueries, joins, IN-list subqueries
  • Multiple catalog tables (pg_class, pg_index, pg_attribute, pg_namespace)
  • Non-existent OID (zero-row case)
  • Parallel query variant
  • EXPLAIN plan verification

One minor gap: no test for the contain_volatile_functions guard — i.e., a query like WHERE (oid)::int4 = nextval('seq') that should NOT be rewritten. This is a correctness edge case rather than a functional gap since a volatile expression would still produce correct results (just without the optimization).

Notes

  • Prior review issues addressed: The previous AI review (DISMISSED) raised the same Issues 1–4 above. They remain advisory and have not been addressed in subsequent commits. The human reviewer (tanscorpio7) has approved the PR.
  • Security review (0 findings): The planner hook is semantics-preserving — int4 and oid share 32-bit representation, RelabelType is binary-identity, oideq is leakproof, and security_level is properly propagated. The hook correctly chains with prev_match_opclause_to_indexcol_hook and restores it on uninstall. No volatile functions bypass, no pointer aliasing, no resource leaks.
  • The InsertExecContext declarations should receive a dedicated security review when their implementation lands.
  • The 1_GRANT_SCHEMA-vu-verify.mix change adds ANALYZE on system catalogs at the start of the test to stabilize plan choices — this is reasonable and helps prevent test flakiness.
  • Upgrade schedule files correctly add BABEL-6814 to all version schedules, ensuring the test runs during minor version upgrades.
  • Cherry-pick context: This PR is cherry-picked from PR #4829 which targets BABEL_6_X_DEV. The target branch BABEL_5_X_DEV is correct for the 5.x backport.

Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.

@tanscorpio7 tanscorpio7 merged commit dbb532c into babelfish-for-postgresql:BABEL_5_X_DEV Jun 22, 2026
47 checks passed
@tanscorpio7 tanscorpio7 deleted the jira-babel-ssms-perf-5x branch June 22, 2026 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants